Cement pavement void detection algorithm based on GPR signal and continuous wavelet transform method

The dimension of the void area in pavement is crucial to its structural safety. However, there is no effective method to measure its geometric parameters. To address this issue, a void size extraction algorithm based on the continuous wavelet transform (CWT) method was proposed using ground-penetrating radar (GPR) signal. Firstly, the finite-difference time-domain (FDTD) method was used to investigate the GPR response of void areas with different shapes, sizes, and depths. Next, the GPR signal was processed using the CWT method, and a 3D image based on the CWT result was used to visualize the void area. Based on the differences between the void and normal pavement in the time and frequency domains, the signal with maximum energy from the CWT time–frequency result was extracted and combined to reconstruct the new B-scan image, where void areas have energy concentration phenomenon. Based on this, width and depth of void detection algorithm was proposed to recognize the void area. Finally, the detection algorithm was verified both in numerical model and physical lab model. The results indicated that the CWT time–frequency energy spectrum can be used to enhance the void feature, and the 3D CWT image can clearly visualize the void area with a highlighted energy area. After fully testing and validating in numerical and lab models, our proposed method achieved high accuracy in void width and depth detection, providing a precise method for estimating void dimension in pavement. This method can guide DOT departments to carry out pre-maintenance, thereby ensuring pavement safety.

new method for the interpretation of GPR data; however, it requires a large number of B-scan images for network training, and subjects to the problem of determining the boundary of target area.
A-scan, which is the basic signal unit in GPR system, contains target information such as depth, layer thickness and material dielectric property.Thus, the GPR signal can be used for identifying boundary of defect area.For example, we surveyed the bridge deck asphalt pavement with a high-frequency 2.3 GHz antenna, and 28 time and frequency domain statistical parameters from A-scan signal were extracted to train an ANN model to locate moisture damage area 15 .Similarly, 28 time and frequency domain parameters and PCA method were also adopted to classify void and normal pavement in our another work 16 .The joint time-frequency parameters provide a new view for identification of underground target, but the selection of sensitive feature parameters is affected by human experience.In contrast, the time-frequency domain transform method can obtain richer information than previous statistical parameters.Dai et al. used short-time Fourier transform (STFT) to improve the interpretation accuracy of GPR signals with a Hamming window based on an integer multiple of the radar wavelet length 17 .STFT subjects to the problem of determining the window width, which affects the signal resolution.
To overcome the shortcoming of STFT, Wu et al. used continuous wavelet transform (CWT) to analyse the GPR signals in the airport runway, and found that the void size could be determined according to the CWT energy 18 .He et al. used the S-transform spectral inversion optimization algorithm to process the GPR data of the airport roadway, and effectively estimated the thickness of the thin void layer 19 .In addition, Liu et al. adopted deconvolution method to extract thickness dimension, and proposed a sparse pulse deconvolution algorithm to extract reflection coefficient sequences, which effectively improved the longitudinal resolution of GPR data 20 .Zhao et al. used the L-curve method to determine the parameters of regularized deconvolution, and predicted the thickness of thin asphalt 21 .Jazayeri et al. proposed a sparse blind deconvolution algorithm to improve the precision extraction of geometric parameters in target regions 22,23 .Existing studies have shown that both the time-frequency statistical parameters and time-frequency transform of GPR signal can be used to estimate the thickness of the underground target area, but there is no research on using the joint time-frequency parameters to determine the geometrical parameters of the void area.
To address the above issue, a void width and depth extraction method was proposed by using CWT method.FDTD method was adopted to simulate the void area in different situations to get the ground truth of void and normal pavement signal.CWT was used to process the GPR signal, based on the energy difference between void and normal pavement in CWT results, a reconstruction energy spectrum method was proposed to estimate the width and depth of void areas.The void geometric parameter detection algorithm was tested and verified both in numerical and lab model.The research results, including void width and depth parameters, can be used to evaluate the bearing capacity and residual life of pavement structures, which is crucial for making premaintenance plan of pavement.

GPR signal processing principle F-K migration algorithm
According to the GPR imaging principle on a single object, hyperbolic curve will be formed on both sides of the target.However, the envelope lines are not the position of the object.The F-K migration algorithm can effectively focus the scattered energy and can eliminate and suppress the hyperbolic effect.Its principle is based on the fact that the electromagnetic (EM) wave emitted by the GPR will undergo multiple refractions and reflections underground.Each reflection point is used as a sub wave source that emits EM waves from zero time.The migration results can be obtained by inversion of the echo data.It uses the reflection source model to solve the wave Eq. (1), in which the reflection signal field is generated by the reflection at the position of the object.F-K migration method calculates the wave field when reflection occurs and the wave is still at the reflection source t = 0.The essence of F-K migration is Fourier transform, which is derived from the general summation expression of wave function as Eq. ( 2) 24,25 .
where v is the propagation velocity of EM waves, k x and k z are wave-numbers in the x and z directions; ω = (v/2) k 2 x + k 2 y represents the frequency; and ϕ is the Fourier transform from the surface field ϕ(x, 0, t) .ϕ(x, 0, t) equals to the measured signal ϒ(x, t) when GPR acquires the waves propagated to the pavement surface.The target is estimated by the initial wave-field I(x, z) at t = 0 as following.
By resampling ϕ(k x , ω) to the k x − k z domain, the migration results can be calculated by the inverse fast Fourier transform F −1 .In our work, F-K migration and the following GPR signal process are all processed by home-made MATLAB program, and the parameter of propagation velocity can be calculated as following. (1) F-K migration is a linear function since the Fourier transform satisfies the linear super-position principle.The migration results of radargram can be regarded as the summation of migration from all Stationary wavelet transform (SWT) coefficients as following.
where M(•) represents the F-K migration function.The SWT coefficients have a good time-frequency resolution, and each component can contain clutter, targets, or hyperbola interference.Effective SWT signals containing the targets can enhance the migration results, while those occupied by clutter or hyperbola interference should be discarded.Thus, the final target profile is reconstructed by selected migration components.

Continuous wavelet transform
GPR signal belongs to transient non-stationary signal.The traditional Fourier transform cannot reveal the distribution of different frequency components in the time domain due to the limit characteristics of its basis function.CWT is a time-frequency localization analysis method with fixed window size and changeable shape, which captures the local and global characteristics of signals through different scales.CWT is often used for time-frequency analysis or transient signal location, especially for signals with sudden change of instantaneous frequency 24 , such as electrocardiogram (ECG) signal diagnose analysis.Continuous wavelet coefficient matrix is extracted from time domain to time-frequency energy spectrum analysis, which is adopted to enhance the characteristics of void area.According to our previous analysis, the GPR feature of the void and normal area in time domain and frequency domain are quite different 16 , usually, the amplitude in void area would be larger than that in normal pavement both in time and frequency domain.Therefore, when the GPR signal is converted from time domain to time-frequency domain, the signal from void area would present high energy than that of normal pavement, this could help us locate the void area.
The expression of CWT is as following.
where x(t) is the GPR signal and ψ a,b (t) is the mother wavelet.a and b are the scale and translation parameters of the mother wavelet, respectively.Therefore, wavelet transform is a multi-resolution analysis method that projects the signal onto a series of wavelet basis functions generated by the mother wavelet.
According to the inner product theorem of wavelet transform, the weighted sum of the amplitude squared of wavelet transform in time and frequency domain is equal to the total energy of the signal in time domain, which was described in Eq. (8).Therefore, the amplitude square of wavelet transform can be used as another form of time-frequency distribution of signal energy.
Considering a good balance between localization in time and frequency, and extracting more feature information, the mother wavelet adopts nonorthogonal complex Morlet (or amor wavelet named in MATLAB) wavelet, which is a Gaussian function with zero mean and modulated by complex cosine.The waveform of mother wavelet has the characteristics of energy attenuation and local convex peaks, which can achieve a good matching to the GPR Ricker reflection wavelet.The time domain analytic function of the mother wavelet is defined as follows.

Reconstructed energy spectrum
The GPR uses spherical wave to transmit and receive signal, therefore, GPR signal would contain many clutters.To improve the signal-to-noise ratio (SNR) of GPR signal and enhance the characteristics of void signal, dedicated GPR signal processing methods were employed, including static correction, energy gain, background removal, band-pass filtering, and F-K migration method.
According to the GPR propagation principle, if the EM wave meets the reflecting surface with dielectric difference during the propagation process, EM wave will be reflected.The larger the reflection index is, the stronger the energy of the reflected wave is.Ricker wavelet has the characteristics of local energy attenuation.The energy www.nature.com/scientificreports/and frequency of the reflected wavelet are closely related to the size of the target area and the difference of the medium.
Due to the target area presents large amplitude or high energy phenomena, we proposed a CWT energy reconstructed method by extracting the single frequency signal information corresponding to the maximum energy in the CWT result.The processing process is shown in Fig. 1.Taking the i-th A-scan x i (t) from processed GPR data, CWT method was used to transform it from time domain to time-frequency domain, where its wavelet coefficient spectrum is G i f , t .If this GPR signal contains a target, we will obtain a maximum value from CWT results denoted as G i max at the target depth area.The frequency component f i m corresponding to maximum G i max is taken as the main frequency from CWT result and was extracted as new A-scan x i (t) .Repeating the recon- struction process for the all the traces in GPR data, we got a new reconstructed signal B-scan B(x, t) , which is the stacking of all new A-scan with maximum frequency from CWT result.Compared new reconstructed B-scan with the original B-Scan in Fig. 1, one can observe that the reconstructed energy spectrum B(x, t) has higher resolution, which could effectively reveal the horizontal and vertical distribution of void areas area.

Energy function probability of the void area
3D visualization of the energy spectrum B(x, t) can effectively highlight target area and reveal the spatial energy distribution of the entire data.The 3D visualization of reconstruction B-scan was shown in Fig. 2, where void areas present local convex peak features.The convex peak is the energy concentration area, which can represent the location and width of the void area.To obtain the void dimension from reconstruction signal more easily, we reduce the dimension of the 2D matrix B(x, t) into 1D energy function y(x) .Thus, the local convex peak of the void becomes the local extreme point of the curve y, and the maximum points can be further used to determine the position of the void.The dimensionality reduction process could be expressed as where y(x i ) is the maximum value of energy spectrum at i-th trace x i , and N is the A-scan number in B-Scan.y(x i ) is the maximum value of i-th trace that makes up the maximum vector y(x) , which is described as following.
(11) y(x i ) = B max (x i , t) The smoothness of y(x) is affected due to noise signal.To smooth the curve, the Symlets 4 wavelet (sym4) is used for the signal y(x) and a 4-layer discrete wavelet transform (DWT) is performed.The decomposition process is shown in Fig. 3.The algebraic relationship between the original signal and the decomposed wavelet coefficients is shown in Eq. ( 13).
where cD i is the high-frequency wavelet coefficient, cA n is the low-frequency wavelet coefficient.n is the number of decomposition layers, the smaller n is, the lower the frequency band resolution is, and vice versa.
Figure 4 is the result curve after 4-layer decomposition by DWT method.Compared with the raw signal, the decomposed approximate signal γ (x) (or cA 4 ) does not affect the position of the extreme value point of curve y(x) and can make the curve smoother.Therefore, we could directly get the extreme points from signal γ (x) to determine void area.
In Fig. 4, decomposition signal γ (x) has three void areas.Each void area has convex peak feature and contain one peak area and two edges, i.e., the local minimum points f 1 and f 2 are the two edges of void area.If we want   to determine the void width, we only need to find the two local minimum points.However, due to the interference of background noise, the local minimum value may not be a real local minimum point of void area, which would cause width error.Therefore, it is necessary to set a threshold to determine whether this point is the local minimum point of void area or not.
Assuming that the energy distribution approximately of sample signal γ (x) obeys the lognormal distribution.The energy probability density function is shown in Eq. (14).Therefore, the expected value E(γ ) can be used as the threshold, which can be determined from Eq. ( 15).
where µ and σ is the mean value and standard deviation of logarithm of γ (x) , respectively.
Due to GPR signal has hyperbolic interference phenomena, it would cause pseudo void points between the two void areas, these points need to be filtered.In Fig. 5, there have multiple pseudo local points, whose value is y min with yellow line.We introduced a relative height factor α to determine whether this point is real or pseudo point, and it is defined as following.
where M 1 , M 2 are the two extreme points on the left and right of the local minima point of minimum value y min .When α > 0 , this local point is caused by hyperbolic interference and should be deleted, otherwise, we keep this point as edge of void aera.

Void width extraction algorithm
The void width extraction algorithm is shown in Fig. 6, and the extraction step is as following.
1. Obtain 1D dimensionality reduction energy function.B(x,t) is the reconstruct matrix with void area according to the flowchart of Fig. 1, and the energy distribution curve T(t) is calculated according to B(x,t).The 1D dimensionality reduction energy function y(x) was obtained according to Eqs. ( 10) and ( 11). 2. Find the threshold value.Estimate the lognormal distribution of each sample γ (x) and determine the thresh- old value K according to Eqs. ( 14) and ( 15). 3. Locate the void boundary point.Find all the maximum points of the function γ (x) and store their values to form a vector A N , N is the trace number.Set the vector value A j = 0 when A j > K. 4. Search the index of void boundary.Filter the interface points between two adjacent void area, search all the local minimum point ϕ(x) from A N and calculate their mean value as H mean ,if ϕ(x i ) > H mean , let ψ(i) = 0 .After this, the vector ψ(m) saves the index of extreme points from A , and m is the number of all extreme points.5. Determine the void width.According to the index of ψ(m) , find the two nearest minimum points on the left and right of the maximum point in the function y(x) to determine the width boundary.
CWT method is performed on the processed GPR data according to the process in Fig. 6, DWT with the sym4 wavelet function is used to obtain the curve γ (x) , which is shown in Fig. 7. Based on the extraction algorithm, we got the trace index of extreme point array ψ(x) .For example, we find the trace number of maximum points at Point 1, and two nearest local minimum points, denoted as f 1 , f 2 .The void width can be obtained from the corresponding track distance between f 1 and f 2 .To locate the two local minimum point, a threshold limit needs (14)   www.nature.com/scientificreports/ to be set to avoid overlapping.And the boundary points 4 and 5 should be deleted by adding constrained condition.Similarly, the width of void area near point 2 and 3 can also be determined by the same method.

Void depth extraction algorithm
Due to the edge of void area are not easy to locate due to the resolution of EM wave, here, we take the maximum energy position in depth (or sample point) as the upper depth of void area.Given that we have obtained the void width and stored in vector L m , which has been determined by width extraction algorithm.To improve the resolution in time axis, here, we adopted S-transform Gs i f , t to process the A-scan data in L m 26,27 , search all the maximum point in energy function g i (t) for each void trace and combine a depth vector g(t).
where g i (t n ) is the maximum energy value at time t n in i-th trace, g(t, i) is the depth vector for void area.
If there is a different dielectric layer underground, there would be a reflected Ricker wavelet, the first subwavelet peak of Ricker wavelet should be the upper depth of void area.Take one S-transform void signal for example, Fig. 8a shows the reflection ricker wave at void area, where point 1#, 2# and 3# are the typical peak point.Point 2# was used to refer the boundary of different media in A-scan signal.The upper depth could be determined by the velocity or dielectric constant (19).
where t = m • dt , m is the sample point index; dt is sample interval time of GPR, ε is the dielectric constant of concrete pavement.
Figure 8b is the absolute value from one void S-transform signal, all the values are converted to positive value.According to the Ricker wave feature, the upper boundary should be the second extreme value, therefore, the void depth could be determined by searching the peak wave points.

Numerical model
Cement pavement model with 7 rectangle air void areas were created and shown in Fig. 9, where all the void areas have the same dimension with 0.1 m × 0.1 m, while their depth ranges from 0.1 to 0.3 m.Simulation was processed in gprMax 3.15 and the simulation parameters are shown in Table 1.The central frequency of the GPR antenna is 800 MHz, the wavelength of the EM wave in the air void area is = 0.375 m.

CWT feature of void area
Due to the numerical result doesn't have noise, therefore, only F-K migration method was used to process the raw data and reduce the hyperbolic phenomenon.GPR signal of normal and void area from simulated model was used for comparison and shown in Fig. 10.It is obvious that the time frequency spectrum between normal and void area are quite different.The CWT result of normal signal is uniform without energy concentration, while the void CWT result has energy concentration area, the energy concentration area reflects the void position.(17)   www.nature.com/scientificreports/All the GPR data from simulated model was processed by CWT and further be combined in 4D CTW matrix.To view the result more easily, Paraview software was used to for 3D view of void area, where the colour represents the amplitude.Therefore, the CWT matrix was exported to VTK file from MATLAB to Paraview.Figure 11a shows the 3D visualization of the time-frequency spectrum of the void area.It can be seen that void areas from B-1 to B-7 are obvious.The deeper the void area, the less energy of CWT result.To view more clearly of void area, traces from 7 central void areas were extracted, and its CWT slice was shown in Fig. 11b, where the 7 void areas could be visualized.This indicates that the CWT can enhance the difference between void and normal pavement, however, it is not easy to determine the boundary of void area just from CWT result.
To view the void feature in 2D view, we reconstruct CWT result according to the flow chart in Fig. 1.The reconstructed energy spectrum is shown in Fig. 12.Compared with simulated model of Fig. 9 with the reconstructed spectrum of Fig. 12, one can easily observe the void area in CWT spectrum, and the reconstructed spectrum is more suitable to present the void shape than that in Fig. 11.
It can be clearly seen that void areas appear in concentration area in Fig. 12.The reason is that the echo energy of the normal signal is low and there is no reflected Ricker sub-echo, so the CWT values of normal area are small, thus, the normal pavement becomes the background noise, which could be filtered to enhance the void area.While the GPR signal appears strong echo in the void area, which highlights void areas in the time-frequency energy spectrum.In addition, the vertical distribution of the concentrated band of the reconstructed energy spectrum reflects the depth information of void area, while the horizontal distribution reflects the width information of the void area, later, we will test our proposed method for width and height detection.

Algorithm validation in numerical model
To calculate the void range, 4-layer DWT is used to eliminate the background noise interference for data in Fig. 12.The processed result is shown in Fig. 13.We got the decomposed approximate signal γ (x) , which is same to cA4.According to the width extraction algorithm, we firstly search all the extreme points, then filter the interface points by setting threshold value K. Finally, we get the boundary index and determine the void width according to f 1 − f 2 × dx , dx is the sample interval.To prevent the minimum point coincidence between the void area, the energy threshold is set as K = 7895.2according to the γ (x) sample.
After the void traces were determined, we used depth extraction algorithm to get the depth for void area.After the depth information was found, we overlapped the width and depth on the original B-scan, the result was shown in Fig. 14.The original Ricker wave is positive phase, thus the GPR wave in void area is in phase with the incident wave.In Fig. 14, almost all the void areas were correctly located at the white band area.However, there are two misjudgement area in B4 and B6 void area due to hyperbolic phenomena between adjacent void area, this interface could be filtered by adding weight function.The test result indicates that our proposed algorithm can be effectively for void area identification.www.nature.com/scientificreports/ We can not locate the depth range, we find the maximum energy point in time axis as void depth area, which is only the upper boudary of void area.Therefore, we only compare the width location accuracy.To evaluate the accuracy of our proposed method, the located void width was compared with the ground truth.Figure 15 shows width error, where the mean error is about 1.5%.Therefore, the proposed algorithm can effectively calculate the geometric parameters of void area, which could be used for further dynamics analysis in void area.

Algorithm validation in lab model
To validate our proposed method which is also suitable for different shape, here, we didn't stick to square void, instead, we adopted circle void.Lab model with 10 void areas shown in Fig. 16a was constructed by C30 cement with vertical dimension of 2.07 m × 0.4 m.Ten void holes with an interval of 15 cm are designed, the diameters of void areas are 100 mm, 90-30 mm, 25 mm and 16 mm in sequence, and the centre depth of holes is 20 cm from top.The USRADAR Subsurface GPR Radar Imaging System with 1 GHz antenna is used for surveying the model along the survey line as shown in Fig. 16 (a).A wood plate was used to support the antenna at the beginning to make sure hole 10 could be fully detected.This method would add noise to the GPR data, because the GPR wave is emitted and received with sphere wave, which would contain the signal of air.The sampling frequency is 16 GHz, the time window is 11.675 ns, and the sample points of each A-scan is 207.GPR data was processed by static correction, energy gain, background removal, band-pass filtering, and F-K migration method.
Figure shows the 3D visualization of reconstructed energy spectrum of lab model, where 9 void areas (from hole 2 to 10) were successfully detected and viewed.Because the size of hole 1 is too small for the sample information of 1 GHz, hole 1 is not viewed in original B-scan.Figure 16c    The area of energy concentration is reflected through chromaticity changes, which can effectively reveal the time-frequency domain characteristics of the void area.
Similar to numerical model, we also overlap the result with width and depth information on the original B-scan and the result was shown in Fig. 16c, where the 9 holes are correctly identified.To calculate the accuracy on lab model, Fig. 16d shows the detection width with groundtruth, the error is 4.2% on the lab model, Fig. 16e shows the width error bar chart of the void disease.The error of deeper voids is greater, but as the depth decreases, the error basically shows a decreasing trend.Therefore, the proposed method can effectively extract the void width and depth information from GPR signal, which could be useful for further pavement structure safety analysis.

Conclusions
This study proposes a method for extracting geometric parameters of void areas from GPR signals by using the CWT method.The CWT method is used to convert the GPR signal from the time domain to the time-frequency domain, where the void area exhibits energy concentration phenomena.By viewing the 3D CWT result, the void area can be clearly visulalized, and a width and depth extraction algorithm has been proposed and validated through both FDTD and lab experiments, achieving high accuracy in void width detection.This study can be summarised as follows: 1.There are obvious differences between the time-frequency spectra of void and normal areas in the CWT results, where void signal has energy concentration phenomena while normal pavement doesn't have.2. The 3D CWT result provides clear visualization of the void area, which could also be used for other underground targets.3. The proposed void width and depth extraction algorithm has been validated in both simulated and lab models, with the width detection accuracy for void areas being 1.5% and 4.2% for simulated and lab models, respectively.
In conclusion, this study presents a promising method for directly detecting the dimensions of void areas from GPR signals.However, it should be noted that this method relies on the difference in the CWT results between normal and void areas, and may not be valid if all test data comes from normal pavement without void area.Additionally, the depth information provided by this method is limited to the maximum energy depth layer of the void area due to the resolution limitations of the GPR antenna central frequency and void size.Further research is needed, including combining this method with machine learning techniques to select relevant void data, and integrating dynamics analysis of slab for determining the structural safety of void areas, which could be very useful for pavement maintenance.

Figure 8 .
Figure 8. Depth location in S-transform energy dimensionality reduction signal.(a) Ricker wave.(b) Absolute value of void S-transform signal.

Figure 9 .
Figure 9. Schematic diagram of the void model.
is the 3D CWT slice diagram of the lab model.The three coordinate axes are arrays corresponding to time, frequency and survey distance, respectively.

Figure 10 .Figure 13 .
Figure 10.CWT Comparison result between normal and void signal.(a) Comparison of normal and void signal in time domain.(b) CWT result of normal trace.(c) CWT result of void trace.

Figure 16 .
Figure 16.Geometric parameter extraction of void area in lab model.(a) The lab model.(b) The 3D reconstructed energy spectrum.(c) The CWT 3D model slice.(d) The depth and width detecting results.(e) Comparing results between detection width and ground truth.